TITLE: NETWORK BASED CLASSIFIED INFORMATION SYSTEMS
FIELD OF INVENTION 

This invention relates to network based classified information systems, to methods of
automatically building searchable databases of classified information derived from web pages
posted on a network, and to web pages for use in such systems and methods.

The information systems and databases of most relevance to this invention are those which
include classified product and service catalogues similar to the Yellow Pages telephone books,
contact indexes similar to the White Pages telephone books, and/or subject indexes similar to
library catalogues. Such information systems and databases typically include sets of
associated classification, contact and/or geographic items of information. For convenience,
classification, contact and/or geographic information will be hereinafter called CCG-data.

The networks with which this invention is concerned are the worldwide public
computer/communications network commonly known as the Internet and private networks -
sometimes called intranets - which allow common access to markup documents on computers
connected to the network. Markup documents are text files prepared using various markup
languages such as HyperText Markup Language (HTML) and Extensible Markup Language
(XML) which are implementations (or dialects) of the Standard Generalised Markup Language
(SGML). The system of accessible files on the Internet is called the World Wide Web (WWW)
and the markup documents themselves are commonly called 'web pages'. A web page is said
to be 'posted' on a network when it is stored on computer-readable media of a host network
computer as a file which is generally accessible to network users. A web page is transported
from the host computer to a requesting computer through intermediate network computers as
a computer-readable signal embodied in a carrier wave. Though this invention is not limited to
Internet based information systems, these terms are used for convenience.

BACKGROUND TO THE INVENTION 

It has been estimated that there are about 100 million web pages on the Internet and that the
number is doubling every two years. Many of these pages include information concerning
commercially offered goods and services and often include contact details. But the difficulty of
locating such information is increasing faster than the growth in the number of web pages.

To assist network users to locate web pages of interest, certain network service providers create
indexes (or databases) of the contents of web pages posted (stored on computer readable
media so as to be generally accessible) on the network and provide 'search engines' to use
the indexes. These indexes are often created automatically by the use of 'web crawlers' which
(i) interrogate computer after computer on the network to locate successive web pages and (ii)
index the words in each web page encountered against the network address (eg Internet
Protocol Address or IPA) and filing system path or universal resource locator (URL) at which
the web page is accessible. Hereinafter the terms URL and URI (Uniform Resource Identifier)
are taken to be identical in meaning and to signify network addresses and filing system paths.
Usually, the indexes consist of a list of unique words with each word having an associated list
of URLs of the web pages wherein the word was found to occur during interrogation. The URL
serves as a 'hyperlink' which, if selected by a user/searcher, results in the associated web
page being automatically transmitted from the computer where it is posted on the network to
the user/searcher's computer where it may be displayed or otherwise processed. The sending
and receiving of files in this way is greatly assisted by user interface programs called 'web
browsers' (or more simply, 'browsers') such as Netscape and Microsoft Internet Explorer.
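
The word index described above (a list of unique words, each with an associated list of URLs) can be sketched as follows. This is only an illustrative sketch, not a description of any particular search engine: the function name build_word_index and the example URLs are assumptions, and the pages are supplied as ready-made text rather than fetched by a crawler.

```python
from collections import defaultdict
import re

def build_word_index(pages):
    """Map each unique word to the sorted list of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        # Split the page text into lower-case words and record the page URL
        # against every word encountered.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return {word: sorted(urls) for word, urls in index.items()}

# Hypothetical pages standing in for documents located by a web crawler.
pages = {
    "http://example.com/a.html": "Plumbing supplies and plumbing repairs",
    "http://example.com/b.html": "Roof repairs",
}
index = build_word_index(pages)
```

Looking up index["repairs"] returns both URLs, which illustrates why simple keyword searches over such an index tend to return many references of mixed relevance.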



The search for web pages of interest using search engines leaves much to be desired:

• simple searches (those using a few keywords in simple combinations) often yield far too
many web page references (URLs) to permit them to be interrogated one-by-one,

• complex searches (those using many keywords and/or complex Boolean expressions)
require considerable expertise to undertake,

• even using optimum search criteria, many irrelevant web pages are referenced because of
inconsistent use of terminology by those who author the original web pages,

• even using optimum search criteria, many relevant pages are missed, again because of
inconsistent use of terminology by web page authors, and

• items of information included in the body of web pages cannot be 'understood' or
associated in useful ways by web crawlers; that is, recognised as, say, a surname, a street
name, a geographic locality, or type of goods or services and, say, a surname strongly
associated with a street name, a geographic locality, or a type of goods or service.

The result is that information provided by search engines from databases which are
automatically compiled using web crawlers is a very poor equivalent of the common Yellow
Pages and White Pages directories which serve the telephone industry (though these
directories are not, of course, automatically compiled from web pages).

In an attempt to improve the usefulness of automatically compiled network databases, some
search engine providers make use of information contained in URLs, such as the country code
and top level domain name codes such as 'com', 'edu', 'net' and 'org' which are sometimes used
to signify the subject matter of web pages. It has been proposed to add more content
classifying codes to URLs (eg, 'chem' to signify chemical subject matter) to allow specialised
databases - national, commercial, chemical, etc - to be generated. However, this proposal
has serious drawbacks:

• URLs are Internet addresses and it is in principle undesirable to confuse the address
function of a URL with that of representing a list of web page classifications or contact
details.

• A URL is an inappropriate container of multiple web page classification codes and contact
details because the length of the URL would cause it to become unwieldy as an Internet
address.

• Including in a URL classification codes drawn from a list of thousands of codes would
compromise the mnemonic quality of Internet addresses such as 'www.yellowpages.com'.

• There is substantial overlap in the subject matter contained in web pages having the
various top level domain name codes.

• There is no consensus on, or standard for, content classification codes in URLs.

Another proposal to add content classification data to web pages has arisen from the wish to
identify pages containing material that may be offensive to some viewers, or should not be
accessed by minors. The Platform for Internet Content Selection (PICS) (see
http://www.w3.org/pub/WWW/PICS and other documents at www.w3.org) is a web page
ratings standard similar in principle to the ratings systems for motion pictures. This system
allows page authors to 'internally' self classify their pages through use of the '<meta...>'
HTML element. Alternatively, 'external' PICS ratings of web pages may be obtained from
ratings service providers accessed each time a URL is selected. In practice, the ratings service
providers have adopted a very limited range of web page classifications. For example, Ararat
Software's Commercial Rating System (see http://www.ararat.com/ratings/ararat10.html)
provides just 5 categories of web page content: commercial content, technical/customer
support, ordering information, downloading information and contact information. In other
[...] the Recreational Software Advisory Council ([...]) [...]
Vancouver Webpages Rating Service (http://vancouver-webpages.com/VWP1.0/) [...]
categories. None of the categories provide classification of web pages by industry,
product or subject with sufficient specificity to be useful [...].
Rather, the categories are intended to prevent web browsers from displaying pages
unsuitable for particular types of web browser users. Such rating systems are not intended to
be used for the automated creation of Yellow or White Pages like databases [...]
and are unsuitable for that purpose because they cannot [...]
the ratings data may only be encoded in the <meta...> element in the <head> of a [...],
drastically limiting the type and usefulness of the data that can be encoded.

Another proposal for classifying the content of web pages, the Meta Content Framework (MCF)
[...] Apple Computer [...]
classified and the classification data to be held in a separate non-HTML [...].
Storing data in non-HTML encoded documents which describes the content of [...]
documents is a technical and economic barrier to the adoption by search [...]
of Yellow or White Pages like databases [...]
text/html) because data stored according to the MCF proposal is not stored in HTML encoded
web pages.

The "Electronic Business Card", vCard, (see "vCard The Electronic Business Card
Version 2.1", versit Consortium Specification, Sept 18, 1996 or [...])
[...] data file (MIME Content Type of [...]
"text/x-vcard") containing contact [...]
White Pages entry which can be exchanged on a network using Simple Mail Transfer Protocol
(SMTP) or using HTTP. It can be associated with a web page by use of a URL [...]
vCard</a>). The Version 2.1 vCard standard data file format (published 18 September 1996)
[...] many items of contact [...]
that, where possible, there should be [...]
names to HTML "<input>" element attribute names (eg vCard property name "TITLE" maps to
[...] name="title"). [...]
data [...] from HTML encoded web pages because data stored according to the vCard
proposal is not stored in HTML encoded web pages.

The inclusion of classified information in separate documents [...]
(vCards) has the disadvantage that there is necessarily [...]
modifications [...]
to allow a person who has accessed a web page using an HTML [...]
to [...] calling up the [...]
of web pages to be classified, web page contextual information would have to be [...]
disadvantage is that non-HTML documents such as vCards contain no
[...] to be displayed. In the display of HTML documents the position,
[...] of the text and other elements of the document are of great importance. The
[...] organised fields is inflexible. For
example, multiple instances of extended parts of the address are not possible. Also
[...] of names, addresses and telephone numbers and so forth are insufficiently [...]

Another [...] (Dublin, Ohio, USA) proposal, known as the
Dublin Core, proposes classifying scholarly web pages by subject (topic of the work or
keywords that describe the content of the work), title, author, publisher, other agent, date,
object type (genre of the object such as home page, novel, poem etc), form, identifier, source,
[...] temporal) (see
http://www.oclc.org:5046/~weibel/html-meta.html and other documents at www.oclc.org). This
[...] classifications. It also does not
include contact details. Names such as that of the author are not specified in sufficient detail to
[...] such as which is the author's first and last names. The proposal specifies
that the details are encoded using the <meta...> element in the <head> of web pages. The
proposal is unsuited to the automated creation of Yellow or White Pages like databases from
web pages because the proposal does not provide for classification of web pages and does
not provide adequate contact details. Further, the use of keywords for describing the content
[...] usually indexed on every word of their content and most often the key words would
be a duplication of words already contained in the document.

It has also been proposed to use the Dewey Decimal System (see
http://orc.rsch.oclc.org:6109/eval_dc.html and http://orc.rsch.oclc.org:6109/bintro.html) to rank
electronic documents against a Dewey Decimal subject classification. The proposal suggests
[...] classification codes to documents during
[...] not specify the exact nature of the assignment
[...] implied that the codes are stored separately from the documents. The proposal
admits that such automated classification is less satisfactory than human classification. The
[...]
web pages because the accuracy of classification is inadequate, does not provide for inclusion
[...] does not provide for inclusion of contact
[...] web page is computationally expensive.

[...] (see page 23 of the www.w3.org document "draft-ietf-html-spec[...]")
[...] Style Sheets. Style sheets provide [...] by which
[...] to suit the needs of different classes of browser
[...] header that acts as
[...] for holding goods and services
[...] earlier standards provided the HTML elements "<person>" and "<address>"
but do not specify the form of the content or method of validating the content of those
[...] followed by first name. Similarly, different conventions exist for writing addresses. Similar
[...]

As such they are of little use in the automatic compilation of searchable databases
[...] was developed to [...]

Of course, many useful databases of the Yellow Pages or White Pages type are made
[...] cannot be automatically generated by scanning web pages using web crawlers
[...] there is no adequate mechanism to relate email addresses to [...]
and their contact details which may also [...]
2S OBJECTIVES OF THE MVENTNM 

[...] objectives are to provide methods [...]
compilation (using crawlers) of databases [...]
the automatic scanning of many such pages posted on a network [...]

OUTLINE OF THE INVENTION

[...] that highly useful databases can be automatically
[...] web pages posted on a network if one or more HTML
[...] data are also [...]. Accordingly, [...]
one or more items which provide the web page author with control over how the CCG-data is
displayed by a browser.

HTML (including version 2 and version 3) and XML are evolving applications (sub-sets or
dialects) of ISO Standard 8879:1986 known as Standard Generalised Markup Language
(SGML). HTML, in large part, is a language used to describe how text (unstructured data) and
graphics is to be formatted for display. The HTML language consists of a finite number of
"elements" (for example, "<BR>" where "BR" is the element name, also called the tag name)
which may contain "attributes" (for example, "<DL COMPACT>" where "COMPACT" is an
attribute named "COMPACT") and may contain values associated with attributes (for example,
"<FONT SIZE=+1>" where +1 is the attribute value of the attribute named "SIZE"). XML is a
language used to describe structured data. The XML language is similarly composed of
elements, attributes and values with a similar syntax to HTML but unlike HTML the element
names which may be used are not restricted and the meaning of the XML data may be
interpreted in any convenient manner. While the XML language is mute about how data
described by XML is to be formatted for display, the data may be used by computer programs
for any purpose including description of how XML coded data is displayed. However, due to its
historic importance in connection with web pages, the term "HTML" is herein used to refer to all
markup languages which are subsets or complete sets of the SGML language. In particular,
the term "HTML encoded CCG phrase" and the synonymous term "CCG phrase" are herein
used to refer to CCG-data encoded in a subset or complete set of the SGML language.
Herein, a "web page" is a document adapted to be or actually accessible through a network
and encoded in a subset or complete set of the SGML language.

For convenience, CCG items in HTML encoded CCG phrases, whether they are syntactically
represented as elements or as attributes, will be referred to hereinafter as CCG attributes.

A CCG phrase includes at least one of the following identifiable types of CCG-data attributes:

• industry, product, service and/or subject classifications,

• contact categories, contact person(s) and/or organisation(s) names, titles or
associations, contact details including physical and postal addresses, telephone and
fax numbers, email and Internet or network addresses or locations, public keys, and

• geographic location details.

A CCG phrase may also include any of the following identifiable types of CCG control
attributes:

• database control attributes to indicate which parts of the data are to be used to
update databases, and

• display control attributes to indicate how browsers are to display the data.

By virtue of occurring in the same CCG phrase, a plurality of CCG-data attributes are
associated with each other.

By virtue of their occurrence in the same CCG phrase, CCG-data attributes are identified as
a set of associated attributes. However, the degree of association between attributes can be
controlled by the inclusion in the phrase of database control attributes.

The start and end of CCG phrases should be identifiable to clearly distinguish these phrases
from other data. To identify the beginning and end of a CCG phrase, at least one HTML
element should have a CCG specific HTML element name or CCG specific attribute name or
CCG specific value. Each CCG attribute may consist, with or without other incidental
characters, of a CCG attribute name and/or a CCG value or values. Preferably, each CCG
phrase is contained in the "<body>" of the web page.

Examples of a CCG specific HTML element are: "<CCG ...>", "<CCG ... />" and
"<CCG>...</CCG>". (Where a CCG phrase is coded in XML, the elements "<XML>" and
"</XML>" may also be needed at the start and end of the CCG phrase.) A less satisfactory
example is "<!--CCG ... -->", where the characters "CCG" after the HTML comment element name
are used to signify that the comment contains CCG-data. An example of the use of a CCG
specific attribute name is: "<START CCG>"..."<END CCG>". An example of the use of a CCG
specific value is: "<START TYPE="CCG">"..."<END TYPE="CCG">". Obviously, other
character strings could be substituted for the element name, element attribute name or
element attribute value "CCG" string of the examples.

The codes "<CCG ...>" and "<CCG ... />" are compatible with most HTML specifications, but
being non-standard HTML, most web browsers do not display any text or attributes (eg
PQ="AQD") within the angle brackets "<" and ">". These codes are preferred where display of
the CCG data is not required and compatibility with older browsers is required (eg CCG
phrases containing only classification values).
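
To illustrate how a program might recognise a CCG phrase delimited by a CCG specific element name, the following is a minimal sketch using the HTMLParser class from Python's standard library. The class name CCGPhraseFinder and the CCG attribute names INDUSTRY and CITY are assumptions made purely for illustration; only the "<CCG ...>" element form and the BUSINESS attribute are taken from this specification.

```python
from html.parser import HTMLParser

class CCGPhraseFinder(HTMLParser):
    """Collect the attributes of every <CCG ...> element found in a page."""

    def __init__(self):
        super().__init__()
        self.phrases = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag names and attribute names, and reports
        # an attribute written without a value (eg BUSINESS) with value None.
        if tag == "ccg":
            self.phrases.append(dict(attrs))

# Hypothetical page: INDUSTRY and CITY are illustrative attribute names.
page = """<html><body>
<p>Welcome to our shop.</p>
<CCG BUSINESS INDUSTRY="plumbing" CITY="Hobart">
</body></html>"""

finder = CCGPhraseFinder()
finder.feed(page)
finder.close()
```

After parsing, finder.phrases holds one dictionary per CCG phrase, while the ordinary text of the page is ignored by this sketch.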

From one aspect, therefore, the invention comprises a web page for posting on a network, the
web page being characterised by the inclusion of at least one CCG phrase in the "<body>" of
the page, the CCG phrase being such that the CCG attributes contained therein are
accessible and identifiable by (i) HTML compliant editors and/or (ii) HTML compliant web
crawlers for the automatic construction of databases of classified information, and/or (iii) HTML
compliant browsers for display on the computer screens of network users.

From another aspect the invention comprises a method of constructing web pages of the
above described type. The web pages may be constructed on digital computers using simple
text editors such as Microsoft Windows Notepad, or preferably, purpose built human controlled
editors or automated composing programs which embody knowledge of HTML and CCG
syntax and grammar. Whichever process is used, CCG attributes are selected and inserted,
modified, deleted and/or organised to form valid CCG phrases in HTML encoded documents
and the documents are posted on computer readable storage devices of computers connected
to a computer network so that the documents are generally available to computers on the
network.

From another aspect the invention comprises a method of populating a database with CCG-
data extracted from web pages. Web pages posted on a network are successively retrieved by
a digital computer program (eg a web crawler) and CCG phrases contained therein are
identified and at least some of the CCG attributes found within the CCG phrases are extracted.
The CCG attribute names are used to determine the type of data in the associated values.
Generally the CCG attributes of interest are those relating to classification, contact and
geographic data and database update controls while the attributes of little or no interest in
database updating are those relating to display controls. Of course, the CCG-data
extracted need only be that relevant to the particular database being updated. For example,
one database may have been designed to index only web page classifications and URLs while
another database may have been designed to index only contact details. Databases also differ
in their internal representation of data and means of associating data. For example, some use
"flat file" tables, others use pointers to data to create network associations while others use
hashing and buckets.
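
The population step described above might be sketched as follows. This is an assumption-laden illustration rather than the method of the specification: the function name update_database and the nested-dictionary layout are invented for the sketch, the display control names are those listed in Example 1 of this document, and database control attributes are not interpreted (the example phrases contain none).

```python
from collections import defaultdict

# Display control names from Example 1; a database updater can skip them.
DISPLAY_CONTROLS = {"show", "hide", "xpos", "ypos", "newline", "align", "size",
                    "color", "face", "blink", "bold", "underline", "italic",
                    "strike", "subscript", "superscript", "clear", "normal"}

def update_database(db, url, phrase):
    """Index each (property, value) pair of one CCG phrase against its page URL."""
    for name, value in phrase.items():
        name = name.lower()
        if name in DISPLAY_CONTROLS:
            continue  # of little or no interest in database updating
        # A valueless attribute such as BUSINESS carries the implied value True.
        db[name][True if value is None else value].add(url)

# db maps property name -> value -> set of URLs (a hypothetical layout).
db = defaultdict(lambda: defaultdict(set))
update_database(db, "http://example.com/a.html",
                {"business": None, "industry": "plumbing", "size": "+1"})
update_database(db, "http://example.com/b.html", {"industry": "plumbing"})
```

Here the attribute name doubles as the database property name; a real updater would map CCG attribute names to whatever properties its particular database defines.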

The conventional nomenclature differs considerably between different types of database.
Depending on the particular database nomenclature, data of the same type is said to be stored
in table columns, fields, attributes and properties. The terms column and field are somewhat
related to the physical representation of the data in files while attribute and property are more
related to the logical representation of data. To avoid confusion with the terms "HTML
attribute", "CCG attribute" or just "attribute", hereinafter a database property means both a type
of data stored in the database and a place in the database where data of the same type is
stored. Database properties are referred to by a name ("property name") or similar reference
and contain values. For example, a database property with the name "City name" and which
contains values which are all the names of cities may be defined as a "City name" type
database property.

Whichever style of database is used, it is preferred that the database update program relate
the CCG attributes to corresponding database properties used by the database update
process so that the database property values are updated with CCG values in a manner which
preserves the distinctness, content and meaning of the CCG values and, preferably, preserves
the CCG value associations expressed in the CCG phrase as sets of associated database
property values of different types.

In some cases, it is desired to know the address of the web page from which the CCG values
were extracted. For example, the purpose of building a database might be to allow searching
of the database by web page classification to provide a list of URLs of web pages or URLs of
portions of web pages which contain matching CCG classifications. The URLs could then be
inserted in an HTML document and transmitted to a web browser as a list of references to web
pages matching a search expression. In that example, associating the URL of a web page or
the URL of a portion of a web page with the CCG values extracted from the same web page or
web page portion is important and the URL or means of reconstructing it must be available and
supplied to the database update process. In one style of database, the values of the same
type are held in separate rows in a column (property) of a database table, and pointers held in
another column (property) are associated with the values by sharing the same table row. The
table row constitutes a set of associated property values. Each pointer points to a bucket
(block of data) containing a list of URLs or pointers to URLs held in a separate bucket or table.
In another style of database, values of different types are held in different tables together with
a [...] number, pointer or [...]
members of the same set. In one variation, the values of set members are prefixed with a code
indicating the type of value and all values are held in the same column of a table. If the
purpose of the database is to hold contact data, recording the web page URL in the database
might not be required although if the URL is not present in the database, updating changes in
the CCG contact details contained within a web page is more difficult. Of course, one
database may be used to record all types of CCG values contained in web pages and
associate with each other any and all values extracted from the same web page or even from
other web pages.

From another aspect the invention comprises a method of searching the databases
constructed as outlined above. These databases may be used for a variety of searching
purposes. For example, to find web page URLs by using the association of web page URLs
with industry, service, product or subject classification or a person's or organisation's name,
address or geographic location values or any combination thereof. In another example, the
databases may be used to find the contact details for people or organisations by name or
location or industry, service, product or web page subject type and so forth by using the
association between items of the contact details in the database without having to retrieve web
pages associated with the contact details.

More particularly, the searching method involves finding URL references, or finding sets of
associated database property values, from databases containing CCG-data. The method
includes the steps of parsing a query phrase received from a computer network to extract query
relational expressions and, from each expression, deriving a query field name, query relational
operator and query value, determining the type of the query field by reference to its name,
relating the query field to a corresponding database property according to type and locating
CCG-data database property values in the database property which return a true value when
tested against the query value using the query relational operator. Finally, the URL references
or the sets of property values associated with the so located CCG-data database property
values are extracted.
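
The searching steps above can be sketched for a single relational expression as follows. This is a simplified illustration under stated assumptions: the operator set, the FIELD_TO_PROPERTY mapping, the function names and the dictionary-based database layout (property name to value to set of URLs) are all hypothetical, and logical operators and grouping are not handled.

```python
import operator
import re

# Relational operators supported by this sketch (an assumed, minimal set).
OPERATORS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt, ">": operator.gt}

# Hypothetical mapping relating query field names to database properties by type.
FIELD_TO_PROPERTY = {"industry": "industry", "town": "city", "city": "city"}

def parse_expression(expr):
    """Derive (field name, relational operator, value) from eg 'Field=Value'."""
    match = re.fullmatch(r"\s*(\w+)\s*(!=|=|<|>)\s*(.+?)\s*", expr)
    if match is None:
        raise ValueError(f"not a relational expression: {expr!r}")
    return match.group(1).lower(), match.group(2), match.group(3)

def find_urls(db, expr):
    """Return the URLs associated with property values that test true."""
    field, op, value = parse_expression(expr)
    prop = FIELD_TO_PROPERTY[field]
    urls = set()
    for stored, stored_urls in db.get(prop, {}).items():
        # Test each stored value against the query value with the operator.
        if OPERATORS[op](stored, value):
            urls |= stored_urls
    return urls

# Hypothetical database: property name -> value -> set of page URLs.
db = {"city": {"Hobart": {"http://example.com/a.html"},
               "Perth": {"http://example.com/b.html"}}}
```

In this sketch the query field "Town" and the field "City" both relate to the same "city" property, which mirrors the step of relating a query field to a database property according to type rather than by exact name.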

Database queries are usually expressed in a query language in the form of a phrase or
sentence. In query by example style enquiry systems, the user types values into input fields on
a form and a program extracts the input values and uses the values to automatically compose
a query phrase or sentence. There are many existing examples of query languages used in
connection with databases. Generally, they consist of relational expressions (eg Field=Value),
logical expressions and grouping of relational and logical expressions by means such as
parentheses. They may also contain sorting and output formatting expressions. Often
abbreviated notation is used in the expressions such as leaving out field names or relational
operators which are then inferred from the value in the expression or implied by default. In an
enquiry the nature and format of the output may also be implied, such as a list of URLs of web
pages or a list of contact details. Whatever is the mechanism of any particular database, the
query expression needs to be parsed and fields in the query expression, explicit, default,
implied or inferred, need be related to database properties of similar type. In some styles of
database enquiry the query expression is evaluated against each row of a table or record of a
file to find rows or records (ie a set of associated property values) which match the query
expression. In other styles, sub-sets of the values of the properties are selected according to
the interpretation of relational expressions in the query expression and the sub-sets are
combined according to logical and grouping expressions in the query to find the sets of
associated property values which match the query expression. Often, to make logical
operations which combine the selected sub-sets more efficient, it is not the values which are
selected but pointers to the values (eg table name and table row) or unique keys (eg URLs or
pointers to URLs) associated with the values. For example, the AND logical operator is often
used to combine two lists so that only values or pointers or keys common to both lists are
found in the combined list. Usually, the query produces a result list which is then provided to
other processes. For example, a list of URLs of web pages is processed to produce an
attractively formatted HTML encoded document containing the URLs and is sent to a web
browser to allow an enquirer to retrieve interesting web pages. In another example, the contact
details associated in the database with each value or pointer in the result list are retrieved from
the database and presented as a report in the form of an HTML encoded document and is
sent to a web browser for viewing.
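
As a brief illustration of combining selected sub-sets with the AND operator and turning the result list into an HTML document, consider the sketch below. The function names combine_and and render_results and the example URLs are assumptions; the HTML produced is deliberately minimal.

```python
from html import escape

def combine_and(*url_lists):
    """AND: keep only the URLs (unique keys) common to every list."""
    common = set(url_lists[0])
    for urls in url_lists[1:]:
        common &= set(urls)
    return sorted(common)

def render_results(urls):
    """Format a result list as a minimal HTML document of hyperlinks."""
    items = "\n".join(f'<li><a href="{escape(u)}">{escape(u)}</a></li>' for u in urls)
    return f"<html><body><ul>\n{items}\n</ul></body></html>"

# Hypothetical sub-sets selected by two relational expressions.
plumbers = ["http://example.com/a.html", "http://example.com/b.html"]
in_hobart = ["http://example.com/a.html", "http://example.com/c.html"]

result = combine_and(plumbers, in_hobart)
html_report = render_results(result)
```

Working on URLs rather than on the stored values themselves keeps the intersection cheap, which is the efficiency point made in the paragraph above.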

From another aspect the invention comprises a method of displaying CCG-data contained in
CCG phrases within web pages which are displayed by a web browser executing in a digital
computer. While a web page is loading or has loaded in a web browser, the web browser
parses the web page and displays the text (or data) of the web page on a display device
connected to the computer. When the web browser parser encounters CCG phrases, the web
browser may display the CCG-data (element and/or attribute names (or translations of element
and/or attribute names) and/or values) in a number of browser specific ways. For example, the
web browser may by default not display any CCG-data, display all CCG-data, not display any
CCG-data until a CCG display control attribute explicitly states that subsequent data should be
displayed, or display all CCG-data until a CCG display control attribute explicitly states that
subsequent data should not be displayed. The web browser may also use CCG display
controls specifying the size, font, position and so forth to alter the display of the CCG-data.
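
The display behaviours described above, in which a display control remains in force until a later control states otherwise, can be sketched as a small state machine. The function name visible_values and the attribute names INDUSTRY, SURNAME, PHONE and EMAIL are assumptions for illustration; SHOW and HIDE are the display controls named in Example 1.

```python
def visible_values(attrs, show_by_default=False):
    """Return the values a browser would display, honouring SHOW and HIDE.

    attrs is an ordered list of (name, value) pairs from one CCG phrase.
    """
    showing = show_by_default
    shown = []
    for name, value in attrs:
        key = name.upper()
        if key == "SHOW":
            showing = True       # stays in force until a later HIDE
        elif key == "HIDE":
            showing = False      # stays in force until a later SHOW
        elif showing and value is not None:
            shown.append(value)
    return shown

# Hypothetical CCG phrase; the data attribute names are illustrative only.
phrase = [("INDUSTRY", "plumbing"), ("SHOW", None),
          ("SURNAME", "Smith"), ("PHONE", "555 0100"),
          ("HIDE", None), ("EMAIL", "js@example.com")]
```

Calling visible_values(phrase) models a browser that hides CCG-data by default, while visible_values(phrase, show_by_default=True) models one that displays it until told otherwise.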

DESCRIPTION OF EXAMPLES 

Having indicated the nature of the present invention, examples or embodiments thereof will
now be described by way of illustration only.

Example 1: HTML Syntax Suitable for Representing a CCG Phrase
The following is an example of HTML element syntax suitable for representing CCG phrases in
which a control (e.g. "SHOW") may be "good until countermanded" and thus apply to more
than one field:

<CCG HREF="url"
    {{NAME="label" | ID="identifier_code"} &| {LANG="language_code" &
    CLASS="class_name"}}
    {
    {SET_SEPARATOR} &|
    {INDEX | NOINDEX} &|
    {SHOW | HIDE} &|
    {XPOS="horizontal_position_number"} &|
    {YPOS="vertical_position_number"} &|
    {NEWLINE} &|
    {ALIGN=center | left | right | justify} &|
    {SIZE={+/-}1|2|3|4|5|6|7} &|
    {COLOR="#rrggbb" | "colour_name"} &|
    {FACE="typeface_name"} &|
    {BLINK &| BOLD &| UNDERLINE &| ITALIC &| STRIKE} &|
    {SUBSCRIPT | SUPERSCRIPT} &|
    {CLEAR{=left|right|all}} &|
    {NORMAL} &|
    {CONTACT &| COPYRIGHT &| DEVELOPER} &|
    {PERSONAL &| BUSINESS &| ASSOCIATION} &|
    {attribute_name="attribute_value(s)"}
    }...
>

where the ellipsis ("...") implies optional repetition of the braced ("{" and "}") items; the
braces are used to group items and are not CCG syntactic elements; "&" (and) implies items
must occur together; "|" (or) implies only one item must occur; and "&|" (and/or) implies any,
including none, of the items may appear together.

Using the syntax of this example, each CCG phrase is represented as an HTML element, the element name being "CCG", and the CCG-data (eg attribute_name="attribute_value") and CCG controls (eg SIZE=+1) are represented as attributes of the HTML element. Some of the attributes (eg SIZE) have explicit values (eg +1) and some attributes have implied values depending on their presence or absence in a CCG phrase (eg when the attribute BUSINESS is present it has the implied value of True, and the implied value of False when absent).
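
The distinction between explicit and implied attribute values can be illustrated with a minimal sketch in Python using only the standard library. The class name, function name and list of flag attributes below are assumptions made for illustration and are not part of the syntax described here. Note that Python's parser reports attribute names in lower case, and that a repeated attribute (such as a second CC) would overwrite the first in this simplified form.

```python
from html.parser import HTMLParser

# Attributes that carry no explicit value: present means True, absent means False.
FLAG_ATTRIBUTES = ("contact", "copyright", "developer",
                   "personal", "business", "association")


class CCGAttributeReader(HTMLParser):
    """Collects the attributes of every <CCG ...> start tag it sees."""

    def __init__(self):
        super().__init__()
        self.phrases = []

    def handle_starttag(self, tag, attrs):
        if tag != "ccg":
            return
        # Every flag starts out False (absent) ...
        phrase = {flag: False for flag in FLAG_ATTRIBUTES}
        for name, value in attrs:
            # ... a valueless attribute is reported with value None, so it becomes True;
            # an attribute with an explicit value keeps that value as text.
            phrase[name] = True if value is None else value
        self.phrases.append(phrase)


def read_ccg_phrases(html_text):
    reader = CCGAttributeReader()
    reader.feed(html_text)
    reader.close()
    return reader.phrases


phrases = read_ccg_phrases('<CCG BUSINESS SIZE=+1 CN="ANZSIC">')
print(phrases[0]["business"], phrases[0]["personal"], phrases[0]["size"])
```

Here BUSINESS is read as True because it is present, PERSONAL as False because it is absent, and SIZE keeps its explicit value.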

Representation in XML syntax requires, at most, only a simple translation. All the items, such as "NORMAL" and "attribute_name", may remain unchanged as attributes of the element named "CCG" (eg <CCG size=+1/>). However, when a CCG phrase is encoded in XML, it is preferred that the items are represented as XML elements. For example, attribute "SIZE=+1" can be represented as element "<size>+1</size>" or "<size value=+1/>" and "NORMAL" can be represented as "<normal/>".

In this example, the attributes ID, LANG and CLASS take their meanings from HTML 3.0. The "url" in HREF="url" may be a link with or without destination anchor labels. For example, the URL http://www.w3.org/docs.html does not contain a destination anchor label (or identifier) while http://www.w3.org/docs.html#searching does contain the destination anchor label "#searching", which is intended to refer to an anchor in docs.html such as <A NAME="searching">...</A>. There is some confusion in various HTML standards documentation about the distinction between the expression NAME="label" and the expression ID="identifier_code". For most practical purposes the two expressions have the same function or meaning: to uniquely identify within a document a position in or portion of that document.

Database control attributes:

"Set separator" indicates the end of association between preceding and following data other than through the weaker mutual association with the same CCG phrase or web page: the data are divided into sets. "Index | Noindex" indicates that the following data are / are not to be indexed by a web crawler. These attributes have an implied attribute value of 'True' if present in and 'False' when absent from a CCG phrase.

Display control attributes:

"Show | Hide" indicates that a browser should show / not show the following data. "Xpos" and "Ypos" indicate the position (for example in pixel or physical units) on the browser screen where the data is to be displayed. "Newline" may be used in addition or as an alternative method of placing text on a browser screen. "Align" indicates the positioning of data on a browser screen relative to the cursor position set by "Xpos", "Ypos" or "Newline". "Size", "Colour" and "Face" indicate the size, colour and type face or font of the following data when displayed on a browser screen. "Blink", "Bold", "Underline", "Italic", "Strike", "Superscript" and "Subscript" indicate that the following data should be displayed blinking, bold, underlined, italicised, struck through, superscripted or subscripted. "Clear" indicates that the browser screen in the region where data will be displayed should be cleared to background before displaying the following data. "Normal" indicates the data is to be displayed without the "Blink", ..., "Clear" characteristics. The display controls which consist of an attribute name without an explicit value have an implied value of 'True' when present and 'False' when absent.

CCG-data attributes:

"Contact &| Copyright &| Developer" indicates that the following CCG-data refers to details for a person or organisation and/or to the copyright owner and/or to the HTML or web page developer. "Personal &| Business &| Association" indicates that the following data refers to details for a person and/or business and/or association. The previous CCG-data attributes have an implied attribute value of 'True' if present in a CCG phrase or set and 'False' when absent from a CCG phrase or set. The attribute_name could be standard CCG attribute names or synonyms of standard CCG attribute names or abbreviations of CCG attribute names which refer to the following types of CCG attribute values, where square brackets "[" and "]" surround suggested attribute names:
• industry or service or product or subject classifications and sub-classifications:
    • classification name [CN]
    • classification codes [CC]
• display only text [TEXT]
• contact:
    • person:
        • courtesy title [PNC]
        • first given name [PNG]
        • other given names [PNO]
        • family name [PNF]
        • name suffix [PNS]
        • qualifications [PQ]
        • associations [PA]
        • contact person title [PT]
        • contact person role [PR]
    • organisation:
        • name [ON]
        • unit [OU]
        • identifier [OID]
    • physical or post or delivery address:
        • type [AT] (= "PHYSICAL" &| "POSTOFFICE" &| "POSTAL" &| "DELIVERY")
        • post office box number [AP#]
        • post office name [APN]
        • room or suite or office or unit or flat or apartment name &| number [AB#]
        • floor name &| number [ABF]
        • building name [ABN]
        • lane or street or road or highway number [AS#]
        • lane or street or road or highway name [ASN]
        • suburb or town or city name [ACN]
        • region or state or territory or province name [ARN]
        • post code [APC]
        • country or nation name [ANN]
    • telephone:
        • type [TT] (= "PREFERRED" &| "VOICE" &| "MOBILE" &| "CAR" &| "MESSAGE" &| "PAGER" &| "FACSIMILE" &| "MODEM" &| "ISDN" &| "VIDEO")
        • nation or country code number [TC#]
        • trunk access number [TT#]
        • area code number [TA#]
        • local number [TL#]
    • email:
        • type [ET] (= "INTERNET" | (other))
        • mailer [EM]
        • address [EA]
    • Internet address:
        • url [IURL]
    • date & time:
        • date & time from [DTF]
        • date & time to [DTT]
        • weekday from [DTWF]
        • weekday to [DTWT]
        • weekday time from [DTWFT]
        • weekday time to [DTWTT]
        • time zone [DTZ]
• brand name [BN]
• public key:
    • key type [KT]
    • key [K]
• geographical:
    • location units [GLU]
    • location [GL]
    • serviced region units [GLRU]
    • serviced region [GLR]

Suggested attribute name [CN] is the name of an attribute associated with the attribute value containing "classification name" type data. For example, the [CN] attribute value could be the name of a proprietary or national or international or other industry classification standard such as the Australian and New Zealand Standard Industry Classification, or "ANZSIC" for short, or the U.S. Bureau of the Census Industrial Classifications (USBCIC). The associated classification codes [CC] attribute value could contain the codes and/or descriptions of the codes of the named standard with or without modifications, deletions or extensions. For example: CN="ANZSIC" CC="61;Road transport" or CN="USBCIC" CC="581;Hardware store". Service classifications such as the International Standard Classification of Occupations could be used. For example: CN="ISCO" CC="4430 Auctioneer". Product classifications such as the Harmonised Commodity Description And Coding System could be used. For example: CN="HSC" CC="8411;Turbojets, turbo-propellers & other gas turbines; parts thereof". For subject classifications, Dewey Decimal and/or Universal Decimal and/or Library of Congress and/or Bliss and/or Colon Classification could be used. For example: CN="DDC" CC="577.699;Sea shore ecology". The inclusion of subject classifications provides a very simple, straightforward method of classifying the subject matter of an HTML document which could be attractive to commercially oriented copyright owners.

The text ([TEXT]), person ([PNC] - [PR]), organisation ([ON] - [OID]), physical or post or delivery address ([AT] - [ANN]), telephone ([TT] - [TL#]), email address ([ET] - [EA]) and Internet address [IURL] are intended to be associated with each other in the obvious manner. Date & time(s) ([DTF] - [DTZ]) are intended to indicate the times at which the address and/or telephone and/or email will be serviced by the associated person(s) and/or organisation(s). The brand name ([BN]) attribute is intended to hold commercial brand names. Public key ([KT] - [K]) is intended to hold public encryption keys for secure communication with the contact person or organisation.

The geographical location [GL] could be a latitude and longitude (eg E148D31'12.5",S36D40'09.6" or E148.5201,S36.6693 or -148.5201,-36.6693), or a Universal Grid Reference (eg 55FV/364402) or other global, national, regional or local location reference with units as specified [GLU], which is typed in or obtained by pointing to a digitally encoded map or other methods. In more populated regions of some countries such as the U.S., street addresses and post codes are associated with a moderately accurate geographic location and can be used to interpolate geographic location data where geographic location data is not explicitly stated in the CCG-data. Using a universally recognised code such as latitude and longitude has advantages when used with international mediums like the Internet. Geographical location is intended to be associated with a post, delivery address or physical address such as place of business or residence. A CCG compliant browser could use this reference to display a map centred on that geographic location. The purpose of the geographical location data is to allow browser users to specify search engine search criteria which will result in the search engine selecting only those Internet accessible documents which provide details about providers which are within a specified region. The serviced region [GLR] is intended to indicate the preferred area of operation of providers expressed in terms of serviced region units [GLRU]. A radial distance (eg in kilometres) or alternate means of expressing an area of interest around a geographic point, such as polygons, are envisaged.

It is envisaged that the CCG attribute_value could be composed of more than one value (actually sub-value) wherein specific characters or character strings separate individual values.
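
As a hedged illustration of composite values, the following Python sketch assumes that a semicolon is the separating character, as it appears to be in values such as "61;Road transport" and "PREFERRED ; VOICE ; MESSAGE". The function names are hypothetical. Because a classification description may itself contain a semicolon, the classification helper splits only at the first separator.

```python
def split_values(value, separator=";"):
    """Split a composite attribute value into its trimmed, non-empty sub-values."""
    return [part.strip() for part in value.split(separator) if part.strip()]


def split_classification(value, separator=";"):
    """Split a [CC] value into (code, description) at the first separator only."""
    code, _, description = value.partition(separator)
    return code.strip(), description.strip()


print(split_values("PREFERRED ; VOICE ; MESSAGE"))
print(split_classification("61;Road transport"))
```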

While specific instances of element names and types have been given in this example, of more importance is the type of data and type of controls over the display and indexing of the data. As an alternative to the preferred immediately following example where the CCG-data is lumped together under the HTML element named "CCG", certain elements of the data, for example the classification data, could be lumped under separate HTML elements with distinctly different names, thereby separating CCG classification data from CCG contact data. However, this is not preferred because the strength of association between the two types of data is weakened.

Example 2: Classification of Portion of a Web Page

Where it is desired to classify a portion of a web page, such as a paragraph about a product, simple CCG-data may be used in conjunction with the syntax of Example 1. For example:

    <A NAME="Radios">AM-FM radio receivers: </A>
    <CCG HREF="#Radios"
        CN="ANZSIC"
        CC="E23.34.78;Electrical equipment - radio receivers AM"
        CC="E23.34.79;Electrical equipment - radio receivers FM"
    </CCG>
    We won't be beaten on the price of these high quality receivers ....

In this example, the CCG phrase appears after the related anchor (<A NAME=...</A>). However, while such proximity visually provides an obvious association between the anchor and related CCG phrase, it is intended that a CCG phrase containing the attribute HREF related to a specific anchor could appear anywhere within the body of a web page and remain related to the named anchor. The CCG phrase containing the attribute HREF could appear in a separate document and thereby relate the CCG-data to the entire document or to a named anchor although, as previously noted, coordinating separate documents can be problematic. In the absence of the HREF and NAME attributes, it is also intended that the CCG-data apply to the whole web page.

Example 3: Classification of Portion of a Web Page using XML Syntax

Using XML syntax and similar attribute names to those of Example 2, the HTML fragment of Example 2 may be rewritten as:

    <A NAME="Radios">AM-FM radio receivers: </A>
    <XML>
    <CCG>
    <HREF>"#Radios"</HREF>
    <CN>"ANZSIC"</CN>
    <CC>"E23.34.78;Electrical equipment - radio receivers AM"</CC>
    <CC>"E23.34.79;Electrical equipment - radio receivers FM"</CC>
    </CCG>
    </XML>
    We won't be beaten on the price of these high quality receivers ....

This example demonstrates that the translation of CCG-data from HTML to XML (and the reverse) involves simple syntactical and grammatical translations. Of course, the resulting HTML and XML, while "well formed", might not be recognised or, if recognised, might not be understood by some parsers.
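
The attribute-to-element translation can be sketched in a few lines of Python using the standard xml.etree.ElementTree module. The function name and the representation of a CCG phrase as a list of (name, value) pairs are assumptions made for illustration. The sketch does not validate names, and names containing "#" (such as AS#) are not legal XML element names, so a stricter encoder would have to rename them.

```python
import xml.etree.ElementTree as ET


def ccg_to_xml(items):
    """Turn (name, value) pairs into child elements of a <CCG> element.

    A value of None marks a control or flag with only an implied value,
    which is written as an empty element.
    """
    ccg = ET.Element("CCG")
    for name, value in items:
        child = ET.SubElement(ccg, name)
        if value is not None:
            child.text = value
    return ET.tostring(ccg, encoding="unicode")


print(ccg_to_xml([
    ("HREF", "#Radios"),
    ("CN", "ANZSIC"),
    ("CC", "E23.34.78;Electrical equipment - radio receivers AM"),
    ("BUSINESS", None),
]))
```

Keeping the items as an ordered list rather than a dictionary allows repeated names such as CC and preserves the order in which controls apply.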

Example 4: Constructing a Web Page Containing CCG-data

As an example, a web page developer, Alice Jamieson, is preparing an advertisement for a local electrician John Williams, trading as Kelso Electrical, who wants to advertise on the web for business within 30 kilometres from his office located at 18 Raglan Street, Kelso, New South Wales. Alice uses a graphical user interface web page authoring tool capable of creating and modifying web pages containing HTML (and XML) CCG phrases by accepting inputs from a user. The tool executes on a digital computer having input devices such as a keyboard, mouse, light pen and touch pad, display devices such as a CRT, LED arrays, liquid crystal arrays and computer-readable media such as magnetic and optical disks, memory arrays, magnetic tape and the like.

The authoring tool also embodies knowledge of the content and structure of CCG phrases such as the attribute names, valid ranges and sets of associated attribute values, the normal order of the attributes in the CCG phrase and interdependencies between attribute values. The tool provides a window where web pages may be viewed in layout (browser) mode and another window where the HTML code may be viewed in editing mode. The tool also provides means of inserting, deleting, modifying and organising HTML elements, changing font size, face and colour and so forth. The tool provides means for the user to build CCG phrases by using input devices to select an edit control representing various possible CCG attributes from a list which the tool then inserts in the body of a web page together with, when not already present, HTML code indicative of the start and end of a CCG phrase. The user then types in the value in the attribute. Similarly, the tool provides means of converting web page text to CCG attributes. Using input devices, the user selects the text to be converted to a CCG attribute then selects an edit control from a list; the tool then inserts the HTML code needed to encode the text as a CCG attribute. However, these semi-manual methods of creating and modifying CCG phrases are inefficient and error prone. The tool also provides a button, which can be activated by using input devices, for access to CCG phrase editing functions. The CCG editing functions consist of a means of extracting the CCG values from existing CCG phrases in the web page being edited, forms for entering and modifying the extracted CCG values, a layout view browser window for altering how the CCG-data displays (position, font size, face, colour, bold, normal, hiding or showing and so forth), a data view browser window to alter which CCG-data values are to be indexed or not indexed in search engine databases, and a means of deleting existing CCG phrases from web pages and inserting new or changed CCG phrases in web pages. Editing cursors marking the current location at which text and/or data may be inserted, deleted or modified are provided in each window and form.

[...] Clicking the CCG [...] is displayed. The form contains [...] CCG attribute names and associated data input fields related to the CCG [...] CCG-data. [...] the edit cursor is not over a CCG phrase (and can [...]). [...] classification [...] physical contact address, phone and fax [...] and his post office business [...] addresses are entered into the forms using a keyboard and mouse. The developer Alice [...] select address blocks (eg physical and post office) for editing. Logic [...] interdependencies. Input [...] in the layout browser. [...] highlighted as a block and moved into position [...] is then used to check which [...] control values) are to be indexed. Input [...] Then another button is clicked which [...] values and inserts the CCG phrase in the page at the location pointed to in the web page layout browser window.

    <XML>
    <CCG>
    <INDEX/>
    <HIDE/>
    <CN>ANZSIC</CN>
    <CC>D36.11.45;Electrical contractors - residential</CC>
    <CC>D36.11.46;Electrical contractors - industrial</CC>
    <SHOW/>
    <CONTACT/> <COPYRIGHT/>
    <BUSINESS/>
    <XPOS>50</XPOS>
    <YPOS>320</YPOS>
    <ALIGN>centre</ALIGN>
    <SIZE>3</SIZE>
    <COLOR>black</COLOR>
    <FACE>Times New Roman</FACE>
    <BOLD/>
    <CLEAR>all</CLEAR>
    <TEXT>Contact :</TEXT>
    <PNC>Mr</PNC>
    <PNG>John</PNG>
    <PNF>Williams</PNF>
    <PQ>AIE</PQ>
    <PA>ARUC</PA>
    <NEWLINE/>
    <PT>Managing Director</PT>
    <NEWLINE/>
    <ON>Kelso Electrical Pty. Ltd.</ON>
    <NEWLINE/>
    <NORMAL/> <ITALIC/>
    <SIZE>-2</SIZE>
    <TEXT>NSW License 45678C</TEXT>
    <NEWLINE/>
    <NORMAL/> <BOLD/>
    <SIZE>+2</SIZE>
    <AT>PHYSICAL</AT>
    <AS#>18</AS#>
    <ASN>Raglan Street</ASN>
    <NEWLINE/>
    <ACN>Kelso</ACN>
    <NEWLINE/>
    <ARN>NSW</ARN>
    <NEWLINE/>
    <HIDE/>
    <ANN>Australia</ANN>
    <NEWLINE/>
    <SHOW/>
    <TEXT>Phone:</TEXT>
    <TT>PREFERRED ; VOICE ; MESSAGE</TT>
    <HIDE/>
    <TC#>61</TC#>
    <SHOW/>
    <TT#>0</TT#>
    <TA#>63</TA#>
    <TL#>456-7828</TL#>
    <TEXT> Fax:</TEXT>
    <TT>FACSIMILE</TT>
    <HIDE/>
    <TC#>61</TC#>
    <SHOW/>
    <TT#>0</TT#>
    <TA#>63</TA#>
    <TL#>456-7829</TL#>
    <NEWLINE/>
    <ET>INTERNET</ET>
    <EA>johnw@firefly.com.au</EA>
    <TEXT> </TEXT>
    <GLU>LatLong</GLU>
    <GL>33.3978S;148.5679E</GL>
    <GLRU>Km</GLRU>
    <GLR>30</GLR>
    <SET.SEPARATOR/>
    <XPOS>250</XPOS>
    <YPOS>320</YPOS>
    <NEWLINE/>
    <NEWLINE/>
    <TEXT>Or write to us at :</TEXT>
    <NEWLINE/>
    <ON>Kelso Electrical Pty. Ltd.</ON>
    <NEWLINE/>
    <AT>POSTOFFICE</AT>
    <AP#>P.O. Box 187</AP#>
    <NEWLINE/>
    <APN>Sunny Corner</APN>
    <TEXT> </TEXT>
    <APC>2795</APC>
    <NEWLINE/>
    <HIDE/>
    <ANN>Australia</ANN>
    <SET.SEPARATOR/>
    <HIDE/>
    <DEVELOPER/>
    <BUSINESS/>
    <PNG>Alice</PNG>
    <PNF>Jamieson</PNF>
    <ET>INTERNET</ET>
    <EA>alijam@firefly.com.au</EA>
    <IURL>http://www.firefly.com.au/~alijam/</IURL>
    </CCG>
    </XML>

In the web page layout browser window the CCG-data display:

    Contact:                          Or write to us at:
    Mr John Williams, AIE, ARUC,
    Managing Director
    Kelso Electrical Pty. Ltd.        Kelso Electrical Pty. Ltd.
    NSW License 45678C                P.O. Box 187
    18 Raglan Street                  Sunny Corner 2795
    Kelso
    NSW
    Phone:063456-7828 Fax:063456-7829
    Email: johnw@firefly.com.au       Map

Having encoded the web page in this way, Alice then posts it on the storage device of a digital computer connected to the Internet from where it can be retrieved through the Internet using the URL "http://www.firefly.com.au/~johnw/index.html".

Example 5: Constructing a Database from Web Pages Containing CCG-data

During a routine sweep of Internet connected web page servers, a web crawler (or robot) operating on a server named "ccg.search.com" executing on an Internet connected digital computer discovers the URL "http://www.firefly.com.au/~johnw/index.html" in a document it had previously retrieved through the Internet. The web crawler decides that the URL matches its selection criteria because the URL contains the suffix ".html". The web crawler then successfully retrieves the document by extracting from the URL the address of the computer hosting the document, then addressing and sending a message (including the address of the web crawler) requesting the web page through the network to the web page host computer using TCP/IP protocol. The host computer then reads the document, addresses it and sends it to the web crawler using TCP/IP protocol, the web crawler waiting until it has received all parts of the web page from the host computer before proceeding. It inspects the contents of the document and finds that it matches the additional selection criteria that it is an HTML encoded document. The web crawler program, depending on its state and logic, then parses the document, strips out and saves some or all of the URLs in the document for future examination. The web crawler program then passes the document together with the URL of the document through a network communications channel to an indexing program executing on a different computer. The indexing computer has database updating software which manipulates a database stored on computer-readable media.
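
The crawler's two bookkeeping steps, saving the URLs found in a retrieved document and testing each against the ".html" selection criterion, might be sketched as follows in Python. This is an illustrative sketch only: the class and function names are hypothetical, network retrieval is omitted, and the referring page URL used in the usage lines is invented for the demonstration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the absolute form of every <A HREF=...> target in a document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Relative links are resolved against the URL of the containing document.
                self.links.append(urljoin(self.base_url, value))


def matches_selection(url):
    """Selection criterion of the example: the URL path ends with '.html'."""
    return urlparse(url).path.lower().endswith(".html")


def candidate_urls(document_url, html_text):
    collector = LinkCollector(document_url)
    collector.feed(html_text)
    collector.close()
    return [url for url in collector.links if matches_selection(url)]


page = '<A HREF="/~johnw/index.html">Kelso Electrical</A> <A HREF="logo.gif">logo</A>'
print(candidate_urls("http://www.firefly.com.au/list.html", page))
```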

The indexing program parses the document from first to last character, indexing some of the meta data in the <head> of the document and the words in the text of the document with respect to the document URL. In the database of this example, unique words extracted from the documents already indexed are held in separate rows of a column of a data table and in another column of the same table on each row is an associated pointer to the first bucket or block of URLs of documents containing the word associated with the pointer. As new words are found, the new word is added as a new row in the word column of the table, a new bucket is created, the URL of the document containing the new word is inserted into the bucket and a pointer to the new bucket is written in the new row pointer column. When the same word is found in another document, the row in the table of the word is found, the pointer is retrieved from the table, the bucket pointed to by the pointer is retrieved and the URL of the other document is inserted in the bucket. Where a bucket becomes full of URLs, a new bucket is created and a pointer to the new bucket for holding additional URLs is placed in the full bucket. Deletion of words and URLs of changed or no longer existing documents is also provided for.
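
The word table and chained URL buckets described above can be sketched in memory as follows. This is a simplified Python illustration, not the database of the example: the class names are hypothetical, the bucket capacity is set artificially small so that overflow is visible, and deletion is omitted.

```python
BUCKET_CAPACITY = 2  # assumption: deliberately tiny so that chaining shows up


class Bucket:
    """A fixed-capacity block of URLs with a pointer to an overflow bucket."""

    def __init__(self):
        self.urls = []
        self.next = None


class WordIndex:
    def __init__(self):
        self.table = {}  # word -> pointer to the first bucket for that word

    def add(self, word, url):
        bucket = self.table.get(word)
        if bucket is None:
            # New word: new row in the table and a new first bucket.
            bucket = self.table[word] = Bucket()
        # Walk the chain to its last bucket, stopping early on a duplicate URL.
        while True:
            if url in bucket.urls:
                return
            if bucket.next is None:
                break
            bucket = bucket.next
        if len(bucket.urls) >= BUCKET_CAPACITY:
            # Full bucket: create a new one and place a pointer to it in the full bucket.
            bucket.next = Bucket()
            bucket = bucket.next
        bucket.urls.append(url)

    def lookup(self, word):
        urls = []
        bucket = self.table.get(word)
        while bucket is not None:
            urls.extend(bucket.urls)
            bucket = bucket.next
        return urls


index = WordIndex()
for page in ("http://a/1.html", "http://a/2.html", "http://a/3.html"):
    index.add("electrician", page)
print(index.lookup("electrician"))
```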

In addition to indexing words extracted from the text of the document, the indexing program also indexes the CCG-data in the document as well as indexing words found in the CCG-data. When the parser finds HTML element "<XML>" in the document, it switches into the XML parsing mode and switches out of that mode when "</XML>" is found. When the element "<CCG>" is found, the parser switches into the CCG parsing mode and switches out of that mode when "</CCG>" is found.

The example database has a CCG-data attribute name to database property name correspondence table to show the relationship between the CCG-data attribute names and the database tables and columns (properties) where the CCG-data attribute values are to be stored in the database as database property values. The database property values and associated URLs are stored in much the same way as for words extracted from text as outlined above. However, CCG contact data, for example, which consists of several distinct CCG-data attributes which are related (eg street name, city), is stored in a database table having a column (property) related to each distinct CCG contact attribute name, and each separate CCG contact data set (eg person's name, address, telephone number) as separated by "<CCG>", "<SET.SEPARATOR>" and "</CCG>" is held in a separate row in the table. The values stored in each row are considered to be a set of associated property values of different types.
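
Dividing the contents of one CCG phrase into rows at each set separator might look like the following Python sketch. The function name and the (name, value) representation of parsed elements are assumptions made for illustration. A repeated attribute within one set simply overwrites the earlier value here, whereas a real database would need to handle multiple values.

```python
def rows_from_phrase(items):
    """Group the (name, value) pairs of one CCG phrase into rows, one per data set.

    Sets are delimited by the start of the phrase, each SET.SEPARATOR,
    and the end of the phrase.
    """
    rows, current = [], {}
    for name, value in items:
        if name == "SET.SEPARATOR":
            if current:
                rows.append(current)
            current = {}
        else:
            current[name] = value
    if current:
        rows.append(current)
    return rows


phrase = [
    ("PNG", "John"), ("PNF", "Williams"), ("ACN", "Kelso"),
    ("SET.SEPARATOR", None),
    ("PNG", "Alice"), ("PNF", "Jamieson"),
]
for row in rows_from_phrase(phrase):
    print(row)
```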
The indexing program, during parsing the document of Example 2 above, encounters the "<CCG>" element and enters the CCG parsing mode. The parser knows to ignore display control attributes and to consider database control elements in the CCG phrase. The example indexing program opts to index all other CCG-data contained in the attribute values until explicitly instructed not to index the attribute values by encountering the "<NOINDEX/>" database control element and then to recommence indexing when the "<INDEX/>" database control element is encountered.
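
The "good until countermanded" behaviour of the INDEX and NOINDEX controls amounts to a small state machine. The Python sketch below is one possible reading under stated assumptions: the function name is hypothetical, the set of display control names is taken from the syntax of Example 1, and set separators are always passed through because later stages need them to delimit rows.

```python
DISPLAY_CONTROLS = {
    "SHOW", "HIDE", "XPOS", "YPOS", "NEWLINE", "ALIGN", "SIZE", "COLOR", "FACE",
    "BLINK", "BOLD", "UNDERLINE", "ITALIC", "STRIKE", "SUBSCRIPT", "SUPERSCRIPT",
    "CLEAR", "NORMAL",
}


def indexable_items(items):
    """Return the (name, value) pairs an indexer should keep.

    Indexing is on by default, is switched off by NOINDEX and back on by INDEX.
    Display controls are ignored entirely.
    """
    indexing = True
    kept = []
    for name, value in items:
        if name == "NOINDEX":
            indexing = False
        elif name == "INDEX":
            indexing = True
        elif name in DISPLAY_CONTROLS:
            continue
        elif name == "SET.SEPARATOR" or indexing:
            kept.append((name, value))
    return kept


items = [
    ("CN", "ANZSIC"), ("HIDE", None),
    ("NOINDEX", None), ("TEXT", "Contact :"),
    ("INDEX", None), ("PNG", "John"),
]
print(indexable_items(items))
```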

Taking each CCG-data attribute name and associated attribute value(s) in succession, the example indexing program uses the correspondence table to translate the CCG-data attribute name to the database table and column (property) names where the CCG-data attribute value(s) are to be stored as database property value(s). The indexing program may opt to translate the CCG-data attribute values to database property values by, for example, converting character strings of digits to binary encoded decimal representation, the string "True" to a single bit representation and the like. The indexing program then adds or updates the database property value(s), using the database table and column (property) names (or similar references) obtained by translation, in much the same manner as outlined above for the update of the database using words extracted from the document text, including associating the data to the document URL where desired. Where the CCG-data contains a "HREF" attribute (or similar), the URL associated with the other CCG-data is a URL taken from the "HREF" attribute value or composed of the document URL and the "HREF" attribute value if the attribute value is a partial or relative URL. Some CCG attributes, such as "<BUSINESS/>", have only an implied value of true if the attribute is present and false if the attribute is absent, the "<SET.SEPARATOR>", "<CCG>" and "</CCG>" resetting such values to false. However, where attribute value(s) associated with different attribute names are still related, such as a person's name and a street name, the related values of different types are stored on the same row of the same database table but in a different column (database property) to preserve the relationship. "<SET.SEPARATOR/>" limits the degree of relatedness between, for example, a person's name occurring before the separator and a street name occurring after the separator.
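
Composing the URL to associate with CCG-data from the document URL and a partial or relative HREF value corresponds to ordinary relative URL resolution. A minimal Python sketch using the standard urllib.parse.urljoin follows. The function name is hypothetical, and the fallback to the document URL when no HREF is present reflects the earlier statement that such CCG-data applies to the whole web page.

```python
from urllib.parse import urljoin


def url_for_ccg_data(document_url, href=None):
    """Return the URL to associate with CCG-data found in the given document."""
    if not href:
        # No HREF: the CCG-data applies to the whole web page.
        return document_url
    # Absolute HREF values are returned unchanged; partial or relative values
    # (including a bare "#anchor") are composed with the document URL.
    return urljoin(document_url, href)


doc = "http://www.firefly.com.au/~johnw/index.html"
print(url_for_ccg_data(doc))
print(url_for_ccg_data(doc, "#Radios"))
print(url_for_ccg_data(doc, "products.html"))
```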

Using the example document and using the same database column (property) names as used for the CCG-data attribute names, a portion of the constructed database table would look like:

    PNC   PNG    PNF        PQ    PA     PT                  URL
    Mr    John   Williams   AIE   ARUC   Managing Director   (pointer)

Difficulties not highlighted by this example are the need to handle properties having multiple values of the same type, "sparse" rows [...] with extremely large numbers of rows. For example, the CCG-data of this example could have contained multiple values of personal qualifications. To represent this type of data using a 2 dimensional table database system, the database would be "normalised" so that the multiple values were stored in a separate table and keys or pointers were used to relate the items in the two tables. Numerous alternate database systems, for example those based on key hashing and data buckets, or tagging data values with prefixes or suffixes related to the type of data value, may be used. Preferably, however, whatever database system is used, it should preserve the associations of CCG-data items present in the CCG phrases.

Because the geographic location data was missing from the postal address of the CCG-data in the example document but a post code was present, the indexing program inferred the geographic location from the post code.


Example 6: Finding Web Page References Using a CCG Database

As an example, Kevin Roloson lives in Sydney but owns and has rented out a house in Bathurst. He wants to use the web to find some electricians based in the general Bathurst region (not only in Bathurst City) to contact for estimating the cost of modifying [...] house. He uses his web browser to open the web page "http://www.ausline.com.au/web_search.html" containing AusLine's search engine web page search criteria input form encoded using the HTML "<form>" element.

The search criteria input form contains several input fields including those labelled "Service classification", "Key words", "City/Suburb/Town", "Country", "Lat/Long" and "Radius". The form also displays a button labelled "Map" to allow latitude and longitude to be selected by pointing to map images. The word "electrician" is typed into the "Service classification" field, "house wiring" into the "Key words" field, "Bathurst" into the "City/Suburb/Town" field and "10" into the field "Radius". The country "Australia" was already showing in the country field because the web page server had received cookie data from the browser indicating that that was the country used when the browser last used the web page. The "submit search" button on the web page was clicked. The browser transmitted a message using TCP/IP protocol to the AusLine server containing the input field values encoded in the header of the message.

After a short delay, the search result HTML encoded web page was returned. Clicking on the "Service classification" input field drop down list box to check the classifications used in the search revealed three items:

• Electrical contractors - residential

• Electrical contractors - industrial

The search engine attached to the server obtained those classifications by using word stemming and searching the text of the service classifications held in its database. The Lat/Long field contained the value "33.3856S;148.5743E" which the search engine obtained by looking up the latitude and longitude of the town "Bathurst" in [...] database. Clicking on the "Map" button retrieved a web page having the image of a map centred on the town of Bathurst and showing the area 20 Km around it. The search engine obtained the map by making a request to another Internet connected [...] latitude, longitude and radius. Clicking on the browser "Back" button returned to the search results page.


The search resuRs cont ai ned 8 titles, brief descriptkxts and URLs inckjding a fcU^rencB 
contai ni ng the URL 'MlpJAmrw.fire^.com.auHohnw/^^ Retrieving each in turn 
revealed that al were wel focused according to the search criteria being related to el^^ 
electrical contractors and engineers in ttie Bathurst area. The search ef>gine obtained these 
45 references to web pages byz 

• searching its database of service classification titles with words stemming from
'electrician' which resulted in three service classification codes.

• searching its database using the three service classification codes to obtain an
intermediate list of URLs of web pages containing those CCG codes.

• searching its database for the two keywords to obtain an intermediate list of
web pages containing those words in the web page text.

• searching its database to find the latitude and longitude of Bathurst, Australia.

• searching its database to obtain an intermediate list of web pages which contain
latitude and longitude data lying within 10 Km of the latitude and longitude of
Bathurst, Australia.

• producing as a result list, a list of URLs which are common to all the intermediate lists.

• obtaining from its database the title and brief description of the web pages.

• formatting the titles, descriptions and URLs into an HTML encoded report.

• transmitting the report to the enquiring web browser.
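
The filtering and intersection steps listed above can be sketched as follows. This is a minimal illustration only, not the search engine described in this specification: the in-memory PAGES dictionary, the classification codes, the keywords "wiring" and "repairs", the example.com URLs and the use of the haversine formula for distance are all assumptions made for the sketch.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in decimal degrees."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def find_pages(pages, codes, keywords, centre, radius_km):
    """Return the URLs common to the classification, keyword and distance lists."""
    # Intermediate list 1: pages carrying any of the classification codes.
    by_code = {url for url, p in pages.items() if codes & p["codes"]}
    # Intermediate list 2: pages whose text contains every keyword.
    by_text = {url for url, p in pages.items()
               if all(k in p["text"].lower() for k in keywords)}
    # Intermediate list 3: pages located within the radius of the centre point.
    by_dist = {url for url, p in pages.items()
               if haversine_km(*centre, *p["latlong"]) <= radius_km}
    # Result list: URLs present in all three intermediate lists.
    return sorted(by_code & by_text & by_dist)

# Hypothetical index rows: URL -> classification codes, page text, (lat, long).
PAGES = {
    "http://example.com/a": {"codes": {"E101"},
                             "text": "Industrial wiring and repairs",
                             "latlong": (-33.40, 148.60)},
    "http://example.com/b": {"codes": {"E102"},
                             "text": "Industrial wiring and repairs",
                             "latlong": (-33.87, 151.21)},  # too far away
    "http://example.com/c": {"codes": {"P200"},
                             "text": "Plumbing, wiring and repairs",
                             "latlong": (-33.39, 148.58)},  # wrong classification
}

BATHURST = (-33.3856, 148.5743)  # '33.3856S;148.5743E' as signed decimal degrees
hits = find_pages(PAGES, {"E101", "E102", "E103"}, ["wiring", "repairs"], BATHURST, 10)
print(hits)
```

Using sets for the intermediate lists keeps the final step a plain intersection, which mirrors the "common to all the intermediate lists" wording above.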

Example 7: Finding Contact Details Using a CCG Database

As an example, Jim Jones of Jones and Sons wants to send a recall notice about a faulty
batch of UV stabilised electrical power cable to all Electrical contractors and Electrical
wholesalers in Australia who have email addresses. He uses his web browser to open the web
page 'http://www.ausline.com.au/contact_search.html' containing AusLine's search engine
contact search criteria input form encoded using the HTML "<form>" element.

The search criteria input form contains several input fields including those labelled "Service
classification", "Country" and "Output format". The word "electric" is typed into the "Service
classification" field, the word "Australia" is typed into the "Country" field and the "Tabular -
Name & Email" option in the "Output format" drop down list box is selected. The "Submit
search" button on the web page is clicked. The browser transmits a message using TCP/IP
protocol to the AusLine server containing the input field values encoded in the header of the
message.
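
As an illustration of how a browser might encode such input field values, the sketch below uses the standard URL form encoding (application/x-www-form-urlencoded). The field names service_classification, country and output_format are hypothetical; the field names actually used by the AusLine form are not given in this example.

```python
from urllib.parse import parse_qs, urlencode

def encode_search(fields):
    """Encode form field values as an application/x-www-form-urlencoded string."""
    return urlencode(fields)

# Hypothetical field names; the values are those typed into the form above.
fields = {
    "service_classification": "electric",
    "country": "Australia",
    "output_format": "Tabular - Name & Email",
}

query = encode_search(fields)
print(query)

# The server side can recover the original values by decoding the string.
decoded = {name: values[0] for name, values in parse_qs(query).items()}
print(decoded)
```

Spaces and the ampersand inside "Tabular - Name & Email" are escaped so that they cannot be confused with the separators between fields.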



After a short delay, the search result HTML encoded web page is returned. Clicking on the
"Service classification" input field drop down list box to check the classifications used in the
search revealed too many classifications for the result to be sufficiently focused. [...]
four classifications were selected from the list

• Electric cable - ducting systems

• Electrical contractors - residential

• Electrical contractors - industrial

• Electrical wholesalers

and the "Submit search" button is pressed again to refine the search.

The search results contained 3,473 names and associated email addresses and URLs to full
contact details. Jim saved the search result page on his computer so that he could use his
email program to send the recall notice to each email address in the list. The email address
'johnw@firefly.com.au' was included in the list.

The search engine obtained these references to web pages by: 

• searching its database using the four service classification titles which resulted in four
service classification codes.

• searching its database using the four service classification codes to obtain an
intermediate list of database primary keys of database table rows containing those
service classification codes in the database Service classification attribute.

• searching its database using the country name "Australia" to obtain an intermediate list
of database primary keys of database table rows containing that word in the
database Country attribute.

• producing as a result list a list of database primary keys which are common to both
the intermediate lists.

• obtaining from its database using the result list the values of the name and email [...]

• using the HTML <table> element to format the name values, email values and full
detail URLs into an HTML encoded report.

• transmitting the report to the enquiring web browser.
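
The primary key intersection and the tabular report described in the steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation described in this specification: the ROWS dictionary, its column names, the classification codes and the example.com addresses are all hypothetical.

```python
from html import escape

def contact_report(rows, codes, country):
    """Build an HTML table of name, email and detail URL for the matching rows."""
    # Intermediate list 1: primary keys of rows with a matching classification code.
    by_code = {pk for pk, r in rows.items() if r["code"] in codes}
    # Intermediate list 2: primary keys of rows with the requested country.
    by_country = {pk for pk, r in rows.items() if r["country"] == country}
    # Result list: primary keys common to both intermediate lists.
    result = sorted(by_code & by_country)

    lines = ["<table>", "<tr><th>Name</th><th>Email</th><th>Details</th></tr>"]
    for pk in result:
        r = rows[pk]
        lines.append(
            '<tr><td>{}</td><td>{}</td><td><a href="{}">details</a></td></tr>'.format(
                escape(r["name"]), escape(r["email"]), escape(r["url"], quote=True)
            )
        )
    lines.append("</table>")
    return "\n".join(lines)

# Hypothetical database rows keyed by primary key.
ROWS = {
    1: {"code": "E101", "country": "Australia", "name": "A & B Electrical",
        "email": "a@example.com", "url": "http://example.com/a"},
    2: {"code": "E101", "country": "New Zealand", "name": "Kiwi Electrical",
        "email": "b@example.com", "url": "http://example.com/b"},
    3: {"code": "P200", "country": "Australia", "name": "Pipe Works",
        "email": "c@example.com", "url": "http://example.com/c"},
}

report = contact_report(ROWS, {"E101", "E102"}, "Australia")
print(report)
```

Escaping each value before it is placed in the table keeps characters such as "&" in a business name from corrupting the HTML encoded report.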

This example relates to finding sets of associated database contact values [...]
references to web pages. However, finding other sets of associated database values such as
sets of associated industry classification values and geographic location values might also be
useful for some purposes.

Thus it is appreciated that the afore stated goals, advantages and objectives are achieved by
the teachings herein. In particular it is seen that unlike the prior art [...]
Yellow pages and White pages databases and the like may be automatically constructed from
HTML encoded web pages. Additionally the database entries may be automatically linked to
specific web pages and portions of web pages allowing convenient methods of indexing
product and service catalogues and the like. It is also appreciated that simpler methods of
constructing databases suited to a variety of other uses such as industry and subject
directories are also provided.

From the foregoing teachings and with the knowledge of those skilled in the art, it is apparent
that other modifications and adaptations of the invention will become apparent. For example,
the method steps disclosed and claimed herein may be practiced in a variety of different
orders. CCG-data may take on a variety of different forms within the meaning of the claims.
Thus, it is our intention to include within the scope of the claims not only the invention literally
embraced by the language of the claims but to include all such modifications and adaptations
which may come to those skilled in the art.

What I claim is:

1. An HTML encoded web page embodied on a computer-readable medium, said web
page comprising at least one HTML encoded CCG phrase, each CCG phrase
comprising:

a) HTML code indicative of the start of a CCG phrase, 

b) at least one CCG-data attribute, and 

c) HTML code indicative of the end of a CCG phrase. 



2. An HTML encoded web page embodied on a computer-readable medium, said web
page comprising at least one HTML encoded CCG phrase, each CCG phrase
comprising:

a) HTML code indicative of the start of a CCG phrase,

b) at least two CCG-data attributes,

c) at least one database control attribute separating said CCG-data attributes into at
least two sets of CCG attributes, and

d) HTML code indicative of the end of a CCG phrase.

3. An HTML encoded web page embodied on a computer-readable medium, said web
page comprising at least one HTML encoded CCG phrase, each CCG phrase
comprising:

a) HTML code indicative of the start of a CCG phrase,

b) at least one CCG-data attribute,

c) at least one attribute of database control attributes, display control attributes; and

d) HTML code indicative of the end of a CCG phrase.

4. A computer implemented method of building a web page comprising at least one HTML
encoded CCG phrase, the method comprising the steps of:

a) displaying a web page on a computer display device,

b) displaying an edit cursor indicating a character position on said display device and
a corresponding character position in said web page, said edit cursor being
positionable within the display of said web page by use of computer input devices,

c) separately displaying on said computer display device a set of edit controls
representing CCG-data attribute types,

d) positioning said edit cursor within said display of said web page using said input
devices,

e) selecting an edit control from said set of edit controls using said input devices,

f) relating said selected edit control to a corresponding CCG-data attribute name,

g) constructing a CCG-data attribute character string comprising a character string
representing said attribute name and another character string representing an
empty CCG-data value,

h) if the said edit cursor is positioned outside a CCG phrase,

   i) inserting into said web page, at the character position indicated by said edit
   cursor, a start character string comprising HTML code indicative of the start
   of a CCG phrase,

   ii) inserting into said web page, immediately after the end of said start
   character string, an end character string comprising HTML code indicative of
   the end of a CCG phrase, and

   iii) positioning said edit cursor between said start and end character strings,

i) inserting said CCG-data attribute character string into said web page at the
character position indicated by said edit cursor,

j) positioning said edit cursor at the character position in said web page of the
CCG-data value of said inserted CCG-data attribute character string,

k) inputting characters using a keyboard,

l) inserting said input characters into said web page at the character position
indicated by said edit cursor, thereby converting said empty CCG-data value to a
non-empty CCG-data value, and

m) writing said web page on computer-readable media.

5. A computer implemented method of building a web page comprising at least one HTML
encoded CCG phrase, the method comprising the steps of:

a) displaying a web page on a computer display device,

b) displaying a start edit cursor and an end edit cursor on said display device, each
said edit cursors indicating a character position on said display device and a
corresponding character position in said web page, said edit cursors being
positionable within the display of said web page by use of computer input devices,

c) separately displaying on said computer display device a set of edit controls
representing CCG-data attribute types,

d) selecting a string of web page characters on said display device using said input
devices to position said start edit cursor to indicate the start of said string of web
page characters and said end edit cursor to indicate the end of said string of web
page characters,

e) selecting an edit control from said set of edit controls using said input devices,

f) relating said selected CCG-data control to a corresponding CCG-data attribute
name,

g) constructing a CCG-data attribute character string comprising a character string
representing said attribute name and another character string representing a
CCG-data value containing said string of web page characters,

h) deleting said string of web page characters from said web page,

i) if the said start edit cursor is positioned outside a CCG phrase,

   i) inserting into said web page, at the character position indicated by said start
   edit cursor, a start character string comprising HTML code indicative of the
   start of a CCG phrase,

   ii) inserting into said web page, immediately after the end of said start
   character string, an end character string comprising HTML code indicative of
   the end of a CCG phrase, and

   iii) positioning said start edit cursor between said start and end character
   strings,

j) inserting said CCG-data attribute character string into said web page at the
character position indicated by said start edit cursor, thereby converting said string
of web page characters to a CCG-data attribute value contained within a
CCG-data attribute contained within CCG-phrase, and

k) writing said web page on computer-readable media.

6. A computer implemented method of building a web page comprising at least one HTML
encoded CCG phrase, the method comprising the steps of:

a) displaying a CCG-data input form on a computer display device,

b) inputting CCG-data values into fields of said data input form using computer input
devices,

c) inserting into the body of a web page a start character string comprising HTML
code indicative of the start of a CCG phrase,

d) inserting into said web page body immediately after the end of said start character
string an end character string comprising HTML code indicative of the end of a
CCG phrase,

e) extracting successive field values from said data entry form together with related
field value type information,

f) relating the type of each extracted field value to a corresponding CCG-data
attribute name,

g) constructing a CCG-data attribute character string comprising a character string
representing said attribute name and another character string representing said
field value,

h) inserting said CCG-data attribute character string into said web page between said
start and end character strings,

i) writing said web page on computer-readable media.

7. A computer implemented method of building a database which comprises sets of
associated property values wherein each set includes at least two property values of
different types, the property values being any of classification values, contact values,
geographic location values, hereinafter collectively referred to as CCG-data, the method
comprising the steps of:

a) retrieving successive web pages from a computer network, each web page being
identified by a URL,

b) searching each web page for a CCG phrase that includes a plurality of different
types of CCG-data attributes,

c) extracting a plurality of said attributes from said phrase,

d) from each extracted attribute, deriving an attribute name and a related attribute
value,

e) determining the type of said extracted attribute and said attribute value by
reference to said attribute name,

f) relating said type of attribute value so determined to a corresponding type of
database property value,

g) relating the URL of said web page to another type of database property value,

h) writing said derived attribute value to the database property value of said
determined corresponding type in a set of associated property values, and

i) writing the URL of said web page to a database property value of said other type
in said set of associated property values.

8. A computer implemented method of building a database which comprises sets of
associated property values wherein each set includes at least two property values of
different types, the property values being any of classification values, contact values,
geographic location values, hereinafter collectively referred to as CCG-data, the method
comprising the steps of:

a) retrieving successive web pages from a computer network, each web page being
identified by a URL,

b) searching each web page for a CCG phrase that includes at least one type of
CCG-data attribute,

c) extracting at least one said attribute from said phrase,

d) from each extracted attribute, deriving an attribute name and a related attribute
value,

e) determining the type of said extracted attribute and said attribute value by
reference to said attribute name,

f) relating said type of attribute value so determined to a corresponding type of
database property value,

g) relating the URL of said web page to another type of database property value,

h) writing said derived attribute value to the database property value of said
determined corresponding type in a set of associated property values, and

i) writing the URL of said web page to a database property value of said other type
in said set of associated property values.

9. A computer implemented method of building a database which comprises sets of
associated property values wherein each set includes at least two property values of
different types, the property values being any of classification values, contact values [...]

a) retrieving successive web pages from a computer network,

[...]

c) extracting a plurality of said attributes from said phrase,

d) from each extracted attribute, deriving an attribute name and a related attribute
value,

e) determining the type of said extracted attribute and said attribute value by
reference to said attribute name,

f) relating said type of attribute value so determined to a corresponding type of
database property value, and

g) writing said derived attribute value to the database property value of said
determined corresponding type in a set of associated property values.

10. A computer implemented method of finding references to web pages posted on a
computer network, the method using a database comprising sets of associated property values
[...] classification values, contact values, [...]
collectively referred to as CCG-data, and URL references,
the method comprising the steps of:

[...] from a computer [...]

[...] query field name,

[...] expression so determined to one of the [...]

i) locating database property values of said determined corresponding type which
return a true value when tested against said query value using said query
relational operator,

j) extracting from said database a list of the URL references associated with the so
located database property values.

11. A computer implemented method of finding sets of associated database property values,
the method using a database comprising sets of associated property values wherein
each set includes at least two property values of different types, the property values
being any of classification values, contact values, geographic values, hereinafter
collectively referred to as CCG-data, the method comprising the steps of:

a) receiving a query phrase including query relational expressions from a computer
network,

b) parsing said query phrase and extracting each of said query relational expressions
included therein,

c) from each extracted query relational expression, deriving a query field name,

d) determining the type of said query relational expression by reference to its derived
query field name,

e) relating said type of query relational expression so determined to one of the
following query relational expression types: CCG-data type, other type,

f) provided said query relational expression is a CCG-data type, deriving a query
relational operator and query value related to its query field name from said query
relational expression,

g) determining the type of said query value by reference to said query field name,

h) relating said type of query value so determined to a corresponding type of
database property value,

i) locating database property values of said determined corresponding type which
return a true value when tested against said query value using said query
relational operator,

j) extracting from said database sets of associated database property values
associated with the so located database property values.

12. A method of displaying a web page comprising at least one HTML encoded CCG
phrase, the method comprising the steps of:

a) retrieving a web page from a computer network,

b) parsing said retrieved web page to locate an HTML code indicative of the start of a
CCG phrase,

c) parsing said located CCG phrase and extracting successive CCG attributes
contained therein until an HTML code indicative of the end of said CCG phrase is
found,

d) from each extracted attribute, deriving an attribute name,

e) determining the type of said extracted attribute by reference to its derived attribute
name,

f) relating said type of attribute so determined to one of the following attribute types:
database control, display control, CCG-data,

g) provided said extracted attribute is not a database control type, deriving an
attribute value related to its attribute name from said extracted attribute,

h) determining the type of said attribute value by reference to said attribute name,

i) relating said type of attribute value so determined to a corresponding type of
parameter of a display-device-control-program,

j) [...] parameter, and

k) where said type of attribute is a [...]

[...] successive values of CCG-data of the CCG phrase are [...]



ABSTRACT 

A system for automatically creating databases containing industry, service, product and
subject classification data, contact data, geographic location data (CCG-data) [...]
pages from HTML, XML or SGML encoded web pages posted on computer networks such as
the Internet or Intranets. The web pages containing HTML, XML or SGML encoded
database update controls and web browser display controls are created and modified by using
simple text editors, HTML, XML or SGML editors or purpose built editors. The CCG databases
may be searched for references (URLs) to web pages by use of enquiries which reference one
or more of the items of the CCG-data. Alternatively, enquiries referencing the CCG-data in the
databases may supply contact data without web page references. Data duplication and
coordination is reduced by including in the web page CCG-data display controls which are
used by web browsers to format for display the same data that is used to automatically update
the databases.



